Abstract. Recent research has indicated that social networking sites are being
adopted as venues for online information seeking. To understand questioners’
intentions in social Q&A environments and to better facilitate such behaviors,
we define two types of questions: subjective information-seeking questions and
objective information-seeking ones. To enable automatic detection of question
subjectivity, we propose a predictive model that can accurately distinguish
between the two classes of questions. By applying the classifier to a larger
dataset, we present a comprehensive analysis comparing questions with
subjective and objective orientations in terms of their length, response speed,
and the characteristics of their respondents. We find that the two types of
questions exhibit very different characteristics, and that question subjectivity
plays a significant role in attracting responses from strangers. Our results
validate the expected benefits of differentiating questions according to their
subjectivity orientations and provide valuable insights for the future design
and development of tools that can assist the information-seeking process in a
social context.
1 Introduction

As understanding the information needs of users is crucial for designing and
developing tools to support their social question and answering (social Q&A)
behaviors, many past studies have analyzed the topics and types of questions
asked on social platforms [1-3]. With a similar aim, in this work we also study
the intentions of questioners in social Q&A, but we focus more specifically on
identifying the subjectivity orientation of a question. In other words, we build
a framework to differentiate objective questions from subjective ones. We
believe this kind of subjectivity analysis is important in social Q&A for
several reasons. First, as previous studies suggested that both factual and
recommendation/opinion-seeking questions are asked on social platforms, our
study allows the underlying user intent behind a question to be detected
automatically, so that more appropriate answers can be provided. More
specifically, we assume that objective questions focus more on the accuracy of
their responses, while subjective questions require more diverse replies that
rely on opinion and experience. Second, we believe that our work can serve as
the first step in implementing an automatic question routing system in a social
context. By automatically distinguishing subjective questions from objective
ones, we could ultimately build a question routing mechanism that directs a
question to its potential answerers according to its underlying intent. For
instance, a subjective question could be routed to someone who shares a similar
experience or knows the context well and can therefore provide more personalized
responses, while an objective question could be sent to a selected set of
strangers based on their expertise or submitted to search engines.

From the above viewpoint, we carry out our subjectivity analysis on Twitter. We
implement and evaluate multiple classification algorithms with a combination of
lexical, part-of-speech tagging, contextual and Twitter-specific features. With
the classifier of question subjectivity, we also conduct a comprehensive
analysis of how subjective and objective questions differ in terms of their
length, posting time, response speed, and the characteristics of their
respondents. We show that subjective questions contain more contextual
information and are asked more often during working hours. Compared to
subjective information-seeking tweets, objective questions tend to experience a
shorter time lag between posting and receiving responses. Moreover, we notice
that subjective questions attract more responses from strangers than objective
ones.


2 Related Work 

As an emerging concept, social Q&A has attracted high expectations due to its
potential as an alternative to traditional information-seeking tools. Jansen et
al. [4], in their work examining Twitter as a mechanism for word-of-mouth
advertising, reported that 11.1% of brand-related tweets were
information-providing, while 18.1% were information-seeking. Morris et al. [1]
manually labeled a set of questions posted on social networking platforms and
identified 8 question types in social Q&A: recommendation, opinion, factual
knowledge, rhetorical, invitation, favor, social connection and offer. Zhao and
Mei [5] classified question tweets into two categories: tweets conveying
information needs and tweets not conveying information needs. Harper et al. [6]
automatically classified questions into conversational and informational, and
reached an accuracy of 89.7% in their experiments.

As for the task of question subjectivity identification, Li et al. [7] explored
a supervised learning algorithm utilizing features from the perspectives of both
questions and answers to predict the subjectivity of a question. Zhou et al. [8]
automatically collected training data based on social signals, such as likes,
votes, and answer counts, in CQA sites. Chen et al. [9] built a predictive model
based on both textual and meta features, and co-trained them to classify
questions as subjective, objective, or social. Aikawa et al. [10] employed a
supervised approach to detect Japanese subjective questions in Yahoo! Chiebukuro
and evaluated the classification results using weighted accuracy, which
reflected the confidence of annotation.
Although a number of works exist on question subjectivity detection, none of
them was conducted within a social context. Considering the social nature of Q&A
on SNS, we present this study, which focuses on comparing objective and
subjective questions in social Q&A, and propose the overarching research
question of this study:

How do subjective and objective information-seeking questions differ in the way
they are asked and answered?

To measure the difference, we first propose an approach that can automatically
distinguish objective questions from subjective ones using machine learning
techniques. In addition, we introduce metrics to examine each type of question.


3 Annotation Method 

To guide the annotation process, this section presents the annotation criteria
adopted for identifying subjective and objective questions in a social context.

Subjective Information-Seeking Tweet. The intent of a subjective
information-seeking tweet is to receive responses reflecting the answerer’s
personal opinions, advice, preferences, or experiences. A subjective
information-seeking tweet usually has a “survey” purpose, which encourages the
audience to provide their personal answers.

Objective Information-Seeking Tweet. The intent of an objective
information-seeking tweet is to receive answers based on some factual knowledge
or common experiences. The purpose of an objective question is to receive one or
more correct answers, instead of responses based on the answerer’s personal
experience.

Considering that not all questions on Twitter have an information-seeking
purpose, our annotation criteria also adopted the taxonomy of
information-seeking and non-information-seeking tweets from [10], although
differentiating these two types is not the focus of this study.

To better illustrate the annotation criteria used in this study, Table 1 lists a
number of sample questions with subjective, objective or non-information-seeking
intents.


Table 1. Subjectivity categories used for annotation

Question Type     Sample Questions

Subjective        • Can anyone recommend a decent electric toothbrush?
                  • How does the rest of the first season compare to the pilot?
                    Same? Better? Worse?

Objective         • When is the debate on UK time?
                  • Mac question. If I want to print a doc to a color printer
                    but in B&W how do I do it?

Non-information   • Why is school so early in the mornings?
                  • There are 853 licensed gun dealers in Phoenix alone. Does
                    that sound like Obama's taking away gun rights?

Given the low percentage of information-seeking questions on Twitter [11], and
to save our annotators’ time and effort, we crawled question tweets from Replyz
(www.replyz.com). Replyz was a popular Twitter-based Q&A site that searched
through Twitter in real time, looking for posts that contained questions based
on its own algorithm (Replyz was shut down on 31 July 2014). By collecting
questions through Replyz, we filtered out a large number of non-interrogative
tweets.

For our data collection, we employed a snowball sampling approach. More
specifically, we started with the top 10 contributors who had signed in to
Replyz with their Twitter accounts, as listed on Replyz’s leaderboard. For each
of these users, we crawled from their Replyz profile all the question tweets
that they had answered in the past. Then, we identified the individuals who
posted those collected questions and went to their profiles to crawl all the
interrogative tweets that they had ever responded to. We repeated this process
until each “seed” user yielded at least 1,000 other unique accounts. After
removing non-Twitter questioners from our collection, we had crawled a total of
25,697 question tweets and 271,821 answers from 10,101 unique questioners and
148,639 unique answerers.

We randomly sampled 3,000 English questions from our collection and recruited
two human annotators to label them according to our annotation criteria for
subjective, objective and non-information-seeking tweets. In the end, the two
coders agreed on the subjectivity orientation of 2,588 of the 3,000 questions
(86.27%). Among the 2,588 interrogative tweets, 24 (0.93%) were labeled as
having mixed intent, 1,303 (50.35%) as non-information-seeking, 536 (20.71%) as
subjective information-seeking, and the remaining 725 (28.01%) as objective
information-seeking. Cohen’s kappa was relatively high at 0.75.
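For readers who want to reproduce an agreement statistic of this kind, the following is a minimal sketch of Cohen's kappa for two coders. The function name and the toy labels are illustrative assumptions; they are not the annotations collected in this study.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Toy labels (hypothetical, for illustration only).
coder_1 = ["subj", "obj", "non", "non", "obj", "subj", "non", "obj"]
coder_2 = ["subj", "obj", "non", "obj", "obj", "non", "non", "obj"]
kappa = cohens_kappa(coder_1, coder_2)
```

In this toy example the coders agree on 6 of 8 items, and kappa discounts the agreement that the label marginals alone would produce by chance.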

4 Question Subjectivity Detection 

4.1 Feature Engineering 

In this section, we introduce the features extracted for question subjectivity
detection. In total, we identified features from four different aspects:
lexical, POS tagging, context and Twitter-specific features.

Lexical Features: We adopted word-level n-gram features. We counted the
frequencies of all unigram, bigram, and trigram tokens that appeared in the
training data. Before feature extraction, we lowercased and stemmed all the
tokens using the Porter stemmer [12].

POS Tagging Features: In addition to the lexical features, we believe that POS
tags can add more context to the words used in the interrogative tweets. To tag
the POS of each tweet, we used the Stanford tagger [13].

Syntactic Features: The syntactic features describe the format or the context of
a subjective or objective information-seeking tweet. The syntactic features that
we adopted in this study include: the length of the tweet, the number of
clauses/sentences in the tweet, whether or not there is a question mark in the
middle of the tweet, and whether or not there are consecutive capital letters in
the tweet.
Contextual Features: We assume that contextual features, such as URL, hashtag,
etc., can provide extra signals for determining whether a question is subjective
or objective. The contextual features that we adopted in this study are: whether
or not a question tweet contains a hashtag, a mention, a URL, and an emoticon.
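The syntactic and contextual indicators listed above can be approximated with a few regular expressions. The sketch below is only an illustration: the regular expressions, the small emoticon pattern, and the function name `surface_features` are assumptions and not the rules used in this study.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Deliberately small emoticon pattern, e.g. :) ;-( =D :P (an assumption).
EMOTICON_RE = re.compile(r"[:;=][-']?[()DPp]")

def surface_features(tweet):
    """Syntactic and contextual indicators for a single tweet."""
    # Blank out URLs so their punctuation does not affect the other checks.
    text = URL_RE.sub(" ", tweet)
    return {
        "length": len(tweet),
        # Sentence-ending punctuation followed by whitespace or end of text.
        "n_sentences": max(1, len(re.findall(r"[.!?]+(?=\s|$)", text))),
        # A question mark with further non-punctuation text after it.
        "qmark_in_middle": bool(re.search(r"\?\s*[^\s?!.]", text)),
        "consecutive_capitals": bool(re.search(r"[A-Z]{2,}", text)),
        "has_hashtag": bool(re.search(r"#\w+", text)),
        "has_mention": bool(re.search(r"@\w+", text)),
        "has_url": bool(URL_RE.search(tweet)),
        "has_emoticon": bool(EMOTICON_RE.search(text)),
    }

example = surface_features("Same? Better? Worse? #tv @bob http://t.co/x :)")
```

Each indicator becomes one column of the feature vector; the sentence count is a rough heuristic and would miscount abbreviations such as "e.g.".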

For both lexical and POS tagging features, we discarded rare terms with observed 
frequencies of less than 5 to reduce the sparsity of the data. 
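As an illustration of the n-gram counting and the frequency cut-off, the following sketch builds a vocabulary of word-level unigrams, bigrams, and trigrams and keeps only those observed at least 5 times. It is a simplified sketch: tokens are split on whitespace and stemming is omitted, and all function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n_max=3):
    """Yield all word-level n-grams of order 1..n_max as space-joined strings."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def build_vocabulary(tweets, min_count=5):
    """Keep n-grams whose training-set frequency is at least min_count."""
    counts = Counter()
    for tweet in tweets:
        # Lowercasing only; a stemmer would be applied here in a fuller version.
        counts.update(ngrams(tweet.lower().split()))
    return {gram for gram, c in counts.items() if c >= min_count}

def vectorize(tweet, vocabulary):
    """Sparse count vector of one tweet restricted to the vocabulary."""
    counts = Counter(ngrams(tweet.lower().split()))
    return {gram: c for gram, c in counts.items() if gram in vocabulary}

# Toy corpus (hypothetical): one question repeated five times plus a rare one.
training = ["can anyone recommend"] * 5 + ["when is the debate"]
vocabulary = build_vocabulary(training)
vector = vectorize("Can anyone help", vocabulary)
```

The same thresholding would apply to POS-tag n-grams once each token is replaced by its tag.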


4.2 Classification Evaluation 

We next built a binary classifier to automatically label subjective and
objective information-seeking questions. We tested our model with a number of
classification algorithms implemented in Weka, including Naive Bayes, LibSVM,
and SMO, using 10-fold cross-validation. We report only the best results
obtained.

First, we evaluated the classification accuracies against the number of features
selected using information gain. We noticed that all three algorithms attained
high accuracies when the number of selected features was about 200. Next, based
on the 200 features selected, we assessed the classification performance using
the evaluation metrics provided by Weka, including accuracy, precision, recall,
and F-measure. The majority induction algorithm, which simply predicts the
majority class, was applied to determine the baseline performance of our
classifier. Table 2 shows the classification results.
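To make the feature selection step concrete, the sketch below computes the information gain of binary (present/absent) features with respect to the class label and keeps the top-k features. It is a from-scratch illustration under that binary-feature assumption, not the Weka implementation used in the experiments; the function names and toy data are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def information_gain(present, labels):
    """Reduction in label entropy after splitting on a binary feature."""
    n = len(labels)
    with_f = [y for f, y in zip(present, labels) if f]
    without_f = [y for f, y in zip(present, labels) if not f]
    remainder = sum(len(part) / n * entropy(part)
                    for part in (with_f, without_f) if part)
    return entropy(labels) - remainder

def top_k_features(feature_columns, labels, k):
    """Names of the k features with the highest information gain."""
    ranked = sorted(feature_columns,
                    key=lambda name: information_gain(feature_columns[name], labels),
                    reverse=True)
    return ranked[:k]

# Toy data (hypothetical): four questions, three binary features.
labels = ["subj", "subj", "obj", "obj"]
columns = {"recommend": [1, 1, 0, 0], "the": [1, 0, 1, 0], "when": [0, 0, 1, 0]}
selected = top_k_features(columns, labels, k=2)
```

A feature that splits the classes perfectly receives a gain of 1 bit here, while a feature that is equally common in both classes receives 0.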


Table 2. Classification results using the top 500 selected features

Method        Accuracy (%)   Precision (%)   Recall (%)   F1 (%)
NaiveBayes    80.12          83.46           22.16        35.02
LibSVM        76.17          90.58           43.47        58.75
SMO           81.65          87.66           26.63        40.85
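The F-measure (F1) values in Table 2 are the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick consistency check, the short sketch below recomputes the last column from the precision and recall values reported in the table; the dictionary layout is simply an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) in percent, copied from Table 2.
table_2 = {
    "NaiveBayes": (83.46, 22.16, 35.02),
    "LibSVM": (90.58, 43.47, 58.75),
    "SMO": (87.66, 26.63, 40.85),
}

recomputed = {name: round(f1_score(p, r), 2) for name, (p, r, _) in table_2.items()}
```

The recomputed values agree with the reported column to two decimal places.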


5 Impact of Question Subjectivity on User Behavior 

In this section, we address our research goal by examining the impact of
question subjectivity on individuals’ asking and answering behaviors in social
Q&A. To do that, we first need to identify the subjectivity orientation of all
25,697 collected questions. However, as we built our classification model as a
further step that provides a subjectivity indication only after a question has
been determined to be informational, we cannot directly apply it to the entire
data set. To solve this challenge, we adopted the text classifier proposed in
[5] and [11] to eliminate all non-information-seeking tweets first. With the
adopted method, we achieved a classification accuracy of 81.66%. We believe this
result is reasonable compared to the 86.6% accuracy reported in [5], as Replyz
had already removed a huge number of non-informational questions based on some
obvious features, such as whether or not the question contains a link. We
present the overall statistics of our classified data set in Table 3.
5.1 Characterizing the Subjective and Objective Questions

Given the positive correlation reported between question length and degree of
personalization in [14], we assume that subjective information-seeking questions
on Twitter are longer than objective ones. To examine the difference, we
conducted a Mann-Whitney U test across the question types on the character and
word scales.


Table 3. Overall statistics of the classified data set

                                      Informational
Question Type   Non-informational    Subjective   Objective
Questions       15,311               3,984        6,402
Questioners     4,762                2,267        3,072
Answers         169,690              44,636       57,495
Answerers       87,331               28,190       33,118


In our data set, information-seeking questions asked on Twitter had an average
length of 81.47 characters and 14.78 words. With the empirical cumulative
distribution function (ECDF) of the question length plotted in Figure 1, we
noticed that both the number of characters and the number of words differ across
question subjectivity categories. Consistent with our hypothesis, subjective
information-seeking tweets (M_c = 87, M_w = 15.95) generally contain more
characters and words than objective ones (M_c = 73, M_w = 14.05). The
Mann-Whitney U test further confirmed our findings with statistically
significant p-values of less than 0.05 (z_c = -17.39, p_c = 0.00 < 0.05;
z_w = -15.75, p_w = 0.00 < 0.05). Through further investigation of the content
of the questions, we noted that subjective questions tended to use more words to
provide additional contextual information about the questioner’s information
needs. Examples of such questions include: “So after listening to
@wittertainment and the Herzog interview I need to see more of his work but
where to start? Some help @KermodeMovie ?”, and “Thinking about doing a local
book launch in #ymm any of my tweeps got any ideas?”




Fig. 1. Distribution of question length on character and word levels 
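The length comparison relies on the Mann-Whitney U test. The following is a minimal sketch of that test using the normal approximation with a tie correction. It is written from scratch for illustration; the function name, the sign convention of z (negative when the first sample tends to be smaller), and the toy samples are assumptions rather than the exact procedure used in this study.

```python
import math

def mann_whitney_u(sample_x, sample_y):
    """Return (U for sample_x, z-score, two-sided p) via the normal approximation."""
    n1, n2 = len(sample_x), len(sample_y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in sample_x] + [(v, 1) for v in sample_y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the mean of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        raise ValueError("all observations are identical; z is undefined")
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, z, p

# Toy word counts (hypothetical): the first sample is uniformly shorter.
u_stat, z_score, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

For samples as large as those in this study the normal approximation is the usual choice; for very small samples an exact test would be preferable.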
5.2 Characterizing the Subjective and Objective Answers

So far, we have only examined the characteristics of subjective and objective
information-seeking questions posted on Twitter. In this subsection, we present
how the subjectivity orientation of a question can affect its responses.

Response Speed 

Considering the real-time nature of social Q&A, we first looked at how quickly
subjective and objective information-seeking questions receive their responses.
We adopted two metrics in this study to measure the response speed: the time
elapsed until receiving the first answer, and the time elapsed until receiving
the last answer. In Figure 2, we plotted the empirical cumulative distribution
of response time in minutes using both measurements. We log-transformed the
response time given its logarithmic distribution.
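The two steps described here, log-transforming the response times and computing their empirical cumulative distribution, can be sketched as follows. The response times below are hypothetical values in minutes, and the helper names are illustrative.

```python
import math

def ecdf(values):
    """Sorted observations and the cumulative fraction reached at each one."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def share_within(values, limit):
    """Fraction of observations that are at most `limit`."""
    return sum(v <= limit for v in values) / len(values)

# Hypothetical times (in minutes) until the first answer arrived.
first_answer_minutes = [0.5, 2, 3, 8, 45, 600]

# Base-10 logarithm compresses the long right tail before plotting.
log_minutes = [math.log10(m) for m in first_answer_minutes]
xs, fractions = ecdf(log_minutes)

within_10_minutes = share_within(first_answer_minutes, 10)
```

Plotting `fractions` against `xs` gives the step curve of an ECDF on a log scale, and `share_within` yields statements of the form "x% of questions were answered within 10 minutes".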

In our data set, more than 80% of questions posted on Twitter received their
first answer in 10 minutes or less, regardless of question type (84.60% of
objective questions and 83.09% of subjective ones). Around 95% of questions got
their first answer within an hour, and almost all questions were answered within
a day. From Figure 2, we noticed that it took slightly longer for individuals to
answer subjective questions than objective ones. The t-test result also revealed
a significant difference in the arrival time of the first answer between
question types (t = -3.08, p < 0.05), with subjective questions on average being
answered 4.60 minutes after the question was posted and objective questions 4.24
minutes after. We assume that this might be because subjective questions were
mainly posted during working hours, whereas respondents were more active during
free-time hours [14].

In addition to the first reply, we also used the arrival time of the last answer
to indicate the temporality of each question. As defined in [15], question
temporality is “a measure of how long the answers provided on a question are
expected to be valuable”. Overall, 67.79% of subjective and 69.49% of objective
questions received their last answer within an hour. More than 96% of questions
of both types closed within a day (96.68% of objective questions and 96.16% of
subjective ones). Again, the t-test result demonstrated a significant
between-group difference in the arrival time of the last answer (t = 3.76,
p < 0.05), with subjective questions on average receiving their last answer 44
minutes after the question was posted and objective questions 38 minutes after.
Examples of objective questions with short temporal durations include: “Hey,
does anyone know if Staples & No Frills are open today?” and “When is LFC v
Valarenga?”




[Figure 2 panels: time duration until the first answer received (log scale); time duration until the last answer received (log scale)]


Fig. 2. Distribution of question response time in minutes 
Characteristics of Respondents

In addition to the response speed, we were also interested in understanding
whether the characteristics of a respondent affect his/her tendency to answer a
subjective or objective question on Twitter. To do so, we proposed a number of
profile-based factors: the number of followers, the number of friends, the daily
tweet volume (measured as the ratio of the total count of statuses to the total
number of days on Twitter), and the friendship between the questioner and the
respondent. Here, we categorized only questioner-answerer pairs with reciprocal
follow relations as “friends”, and the rest as “strangers”.
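The two derived factors, daily tweet volume and friendship, follow directly from their definitions. The sketch below shows one way to compute them; the argument names and the example dates are hypothetical and do not correspond to fields of any particular API.

```python
from datetime import date

def daily_tweet_volume(status_count, joined_on, observed_on):
    """Total number of statuses divided by the number of days on Twitter."""
    days_on_twitter = max((observed_on - joined_on).days, 1)  # avoid division by zero
    return status_count / days_on_twitter

def friendship(questioner_follows_answerer, answerer_follows_questioner):
    """Only reciprocal follow relations count as "friends"."""
    if questioner_follows_answerer and answerer_follows_questioner:
        return "friends"
    return "strangers"

# Hypothetical respondent: 3,650 statuses over exactly 365 days.
volume = daily_tweet_volume(3650, date(2010, 3, 1), date(2011, 3, 1))
relation = friendship(True, False)  # a one-way follow is treated as "strangers"
```

Under this definition, one-way follow relations and pairs with no follow relation fall into the same "strangers" category.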

We crawled the profile information of all respondents in our data set, as well
as their friendships with the corresponding questioners, via the Twitter API.
Since our data set spanned from March 2010 to February 2014, 2,998 out of 59,856
unique users in our collection had either deleted their Twitter accounts or set
their accounts as private. So, we were only able to collect the follow
relationship between 95% (78,697) of the unique questioner-answerer pairs in our
data set.

We used logistic regression to test whether any of our proposed factors were
independently associated with the respondent’s behavior of answering subjective
or objective questions on Twitter. The results of our logistic regression
analysis are shown in Table 4.

Table 4. Logistic regression analysis of variables associated with subjective or
objective question answering behavior

Predictor             Odds Ratio   p-value
Number of followers   1.00         0.24
Number of friends     1.00         0.07
Daily tweet volume    0.99         0.00*
Friendship            1.04         0.03*


From Table 4, we noticed that among all four variables, the respondent’s daily
tweet volume and friendship with the questioner were significantly associated
with his/her choice of answering subjective or objective questions in social
Q&A. To better understand those associations, we further performed post hoc
analyses on those significant factors.

First, as for the friendship between the questioner and the respondent, among
all 78,697 questioner-answerer pairs in our data set, 22,220 (28.23%) of the
follow relations were reciprocal, 24,601 (31.26%) were one-way, and in 31,871
(40.51%) the two users were not following each other. The proportion of
reciprocal follow relations in our collection is relatively low compared to the
70%-80% and 36% rates reported in [16, 17]. We think this is because Replyz
created another venue for people to answer others’ questions even if they were
not following each other on Twitter, and this enabled us to better understand
how strangers in social Q&A select and answer questions.

Besides the overall patterns described, we also conducted a chi-square test to
examine the dependency between the questioner-respondent friendship and the
answered question type. As shown in Table 5, the chi-square cross-tabulation
revealed a significant association between the two variables (χ² = 13.96,
p = 0.00 < 0.05). We found that in real-world settings, “strangers” were more
likely to answer subjective questions than “friends”. This was unexpected given
that previous work [1] showed that people claimed in a survey that they prefer
to ask subjective questions to their friends for tailored responses. One reason
for this could be that, compared to objective questions, subjective questions
require less expertise and time investment, which makes them a better option for
strangers to offer their help.
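The chi-square test of independence on a cross-tabulation such as question type by friendship type can be sketched as below. The implementation is a plain Pearson chi-square without continuity correction, the p-value shortcut is valid only for one degree of freedom (a 2x2 table), and the counts are toy values rather than the counts observed in this study.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic

def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

# Toy 2x2 table (hypothetical): rows are question types, columns friendship types.
toy_table = [[10, 20],
             [30, 40]]
chi2 = chi_square(toy_table)
p = p_value_df1(chi2)
```

With one degree of freedom the chi-square statistic is the square of a standard normal variable, which is why the p-value can be obtained from the complementary error function.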


Table 5. Answered question type by questioner-answerer friendship

                   Friendship Type
Question Type   Friends             Strangers
Subjective      23.9% (n = 6359)    25.3% (n = 20229)
Objective       76.1% (n = 8234)    74.7% (n = 24355)


In addition, to examine the relationship between the respondent’s daily tweet
volume and his/her answered question type, a Mann-Whitney U test was performed.
The result was significant (z = -7.87, p = 0.00 < 0.05), with respondents to the
objective questions posting more tweets per day (M = 15.07) than respondents to
the subjective questions (M = 13.24). This result further supports our
presumption in the previous paragraph that individuals who spend more time on
social platforms are more willing to answer more time-consuming questions, in
our case, the objective ones.


6 Discussion and Conclusion 

In this work, we distinguished and analyzed 6,402 objective and 3,984 subjective
questions. First, we found that contextual restrictions were imposed more often
on subjective questions, which made them generally longer than objective ones.
In addition, our results revealed that subjective questions experienced longer
time lags in getting their initial answers. Furthermore, we noticed that it took
less time for objective questions to receive all their responses. One
interpretation of this finding could be that many of the objective questions
asked on Twitter were about real-time content (e.g., when a game will start or
where to watch the election debates) and were sensitive to real-world events
[5], so answers to those questions tended to expire in shorter durations [15].
Another possible explanation is that, since answers to objective questions are
supposed to be less diverse, individuals would quickly stop providing responses
once they saw that a satisfactory number of answers to those questions already
existed. Of course, both speculations need support from future detailed case
studies. Finally, in assessing the preferences of friends and strangers in
answering subjective or objective questions, we demonstrated that even though
individuals prefer to ask subjective questions to their friends for tailored
responses [1], in reality subjective questions were answered more by strangers.
We think this gap between the ideal and reality poses a design challenge in
maximizing the personalization benefits from strangers in social Q&A.
In terms of design implications, we believe that our work contributes to the
social Q&A field in two ways. First, our predictive model of question
subjectivity enables automatic detection of subjective and objective
information-seeking questions posted on Twitter and can be used to facilitate
future studies on large scales. Second, our analysis results allow practitioners
to understand the distinct intentions behind subjective and objective questions,
and to build corresponding tools or systems to better enhance the collaboration
among individuals in supporting social Q&A activities. For instance, we believe
that given the survey nature of subjective questions and strangers’ interest in
answering them, one could develop an algorithm to route those subjective
questions to appropriate respondents based on their locations and past
experiences. In contrast, considering the factual nature and short duration of
objective questions, they could be routed to either search engines or
individuals with the relevant expertise or availability. In summary, our work is
of good value to both the research community and industrial practice.